(57) Abstract



An ATM switch architecture expandable to multi-terabits/s uses data transfer in a heterogeneous burst of a constant length. It employs 
rotators connecting stages in a three-stage switch configuration. In one embodiment, the cells are sorted at ingress and a matching process 
is performed between the first and middle stages. The switch is simple to control and has high performance at both the call and cell levels. 
It also meets the basic requirements that cells be delivered in the proper order, and that the rate of any individual connection be as high as 
the inlet-port rate. With a small internal expansion, the switch is non-blocking in the sense that any bit-rate acceptable to both the inlet and 
outlet ports will be guaranteed a path through the core. This feature is particularly useful in services which may require frequent bit-rate 
change during the connection time. 
HIGH CAPACITY ATM SWITCH

Technical Field and Industrial Applicability 

The invention generally relates to ATM switches. In particular,
it is directed to high capacity ATM switches which use rotators and
common memory modules.

Background Art 

Traditional ATM switches are primarily cell-synchronous. The
two most popular configurations used in large-scale switching nodes
are the buffer-space-buffer and the three-buffer-stage networks. Fig. 1
shows a typical buffer-space-buffer network. An NxN single-stage space
switch interconnects N asynchronous multiplexers to N asynchronous
demultiplexers. Priority queuing may be provided at the inlet modules
in order to control the quality-of-service (QOS) of traffic streams
belonging to different classes. The inlet multiplexers and outlet
demultiplexers may be paired to form a folded architecture with
intra-module switching. With asynchronous multiplexing at the inlet,
this configuration requires a fast mechanism for contention
resolution: in the classical buffer-space-buffer architecture,
arbitration to resolve multiple simultaneous demands for a given
outlet is done on a cell-by-cell basis.

Figure 2 depicts a known plain three-buffer-stage configuration.
This configuration does not have a contention problem, thanks to the
extra buffering stage, but has some capacity limitations. In the
architecture of Figure 2, each component is an nxn common-memory
(CM) or output-buffered (OB) switch; n is typically 16 or so and there
are P middle modules. With P=n, the total capacity is limited to $n^2$
times the link speed r. In the folded architecture which will be
described below, the capacity limit is $(1/2) n^2$ times the link speed r.
The cells of a given connection, between different outer modules, must
be routed through the same intermediate switching module in order to
guarantee proper cell order. Priority service can be implemented at one
or more stages.

As mentioned above, the cell-synchronous switches have
capacity limitations. U.S. Patent No. 5,475,679 issued on December 12,
1995 (Munter), describes a switch architecture suitable for very
high-speed networks. The design was guided by two main principles. The
first is to transfer multiple cells, padded by a reasonable guard time, to
circumvent the high speed cell synchronization problem. The second
is to sort the incoming cells at ingress to facilitate internal routing and
congestion control within the switch. The multiple cells, hereafter
called bursts, must belong to the same egress port, and the burst length
could vary significantly from one cell to a hundred cells or so. The
bursts are transferred directly from inlet to outlet through an optical
space switch and a central controller is used to realize a collision-free
transfer. As such, the switch capacity is limited mainly by the speed of
the controller.

Specifically speaking, the classical buffer-space-buffer architecture 
has a single input buffer (perhaps per class), and the destination 
information is only stored in the cell headers. As shown in Figure 3, in 
the architecture of the above co-pending application the cells are sorted 
according to destination, thus facilitating the contention resolution 
task. In Figure 3, the common buffer of each inlet module (inlet to 
space switch) is divided into a number of variable length sections. The 
number of sections is N or less, depending on the number of inlet 
modules. Priority service can be implemented by a further subdivision 
of each section according to the number of classes per destination. The 
inter-module payload transfer is based on requests and grants. An inlet 
module which has cells to send to an outlet module must signal its 
intention to do so. The control system decides the time of the load 
transfer and the number of cells in each transaction. The load is 
transferred in the form of homogeneous bursts; a homogeneous burst 
contains cells of the same destination as depicted in Figure 4. The idle 
slots shown in Figure 4 represent the inter-burst guard time. This 
gives rise to two possibilities: a centralized-control protocol, or a 
protocol based on distributed control. The architecture in the 
copending application is based on central control. An inlet module 
makes a request by simply indicating the required destination and the 
number of cells in the current load. This information is sent through a 
control bus accessed periodically (or by any other suitable discipline) by 
the central controller. Cell order is naturally preserved since requests 
are processed one at a time. The highest individual connection rate 
equals the inlet-port rate, for example 600 Mb/s or so. The capacity of
the switch is limited primarily by the controller speed. Even with a
dedicated processor per inlet controlling the traffic flow, excessive
delays would occur when the number of destinations is large. With a
relatively small number of inlet (outlet) modules, N=16 for example,
the cell delay performance is excellent. Also, the inlet-buffer
requirement is quite modest at relatively high traffic loads. Thus, an
infinitesimal cell-loss is realizable with a reasonable buffer size.

In U.S. Patent No. 5,168,492 (Beshai et al), issued December 1,
1992, rotating access ATM/STM packet switches are described which are
functionally equivalent to the classical buffer-space-buffer architecture.
The basic embodiment uses middle packet buffers with a rotator
(commutator) at their input and output.

The use of burst transfer, optical rotators, and distributed control
facilitates the construction of high capacity switches using lower
capacity modules. According to the present invention, a significant
capacity increase can be realized if rotators are used and several
controllers operate simultaneously on non-overlapping inlet-outlet
pairs. This can be achieved in a simple manner if the condition that a
burst must contain cells of the same destination is relaxed, and if the
bursts are of equal size. In the architecture of Figure 3, the bursts are
homogeneous (i.e., all the burst cells have the same destination) and of
variable length, as shown in Figure 4. The control can be enhanced if
the bursts are heterogeneous and of equal size as shown in Figure 5. A
heterogeneous burst may contain cells of different destinations. The
invention therefore uses the concepts described in the above-referenced
copending patent application and the rotating-access idea of
U.S. Patent No. 5,168,492 to construct a switch with an ultimate capacity
of several terabits/s. The maximum connection rate, which is the
permissible rate for a single user, is the inlet port speed.


Objects of the Invention 

It is an object of the invention to provide a high capacity ATM
switch which employs rotators in a three-stage configuration and
transfers data in heterogeneous bursts of a predetermined length.

It is another object of the invention to provide a method of
switching data in heterogeneous bursts of a predetermined length.






WO 97/16004 



PCT/CA96/00673 




It is a further object of the invention to provide a high capacity 
ATM switch which uses matching of cells between inlet buffers and 
middle buffers. 

It is yet another object of the invention to provide a method of
switching data in heterogeneous bursts of a predetermined length
which includes a step of matching cells between the inlet and middle
stages.

It is still another object of the invention to provide a high 
capacity ATM switch which is internally non-blocking. 


Disclosure of the Invention 

Briefly stated, according to one aspect, the invention relates to a
high capacity ATM switching system for switching data in a burst of a
predetermined number of cells among N inlet modules and M outlet
modules in each successive access time, M and N being positive
integers. The switching system comprises the N inlet modules having
buffers, each buffer dedicated to one of the outlet modules, for storing
cells in respective buffers according to the destination outlet modules
of the cells, and P common memories, P being a positive integer,
each common memory having M memory sections, each of which is
able to hold at least said predetermined number of cells and is
dedicated to one outlet module. The switching system further
includes an inlet rotator for cyclically connecting in each access time
the N inlet modules and P common memories so that respective cells
are transferred from the N inlet modules and stored in respective
sections according to the destination outlet module of each cell, and an
outlet rotator for cyclically connecting in each access time the P
common memories and M outlet modules so that respective outlet
modules are connected to respective memory sections for reading out
cells contained therein.

According to another aspect, the invention is directed to a
method of switching data in a burst of a predetermined number of cells
among N inlet modules and M outlet modules in each successive
access time, M and N being positive integers. The method comprises
steps of each of the N inlet modules storing cells in separate buffers
according to the destination outlet modules of the cells and cyclically
connecting the N inlet modules and P common memories, P being a
positive integer. The method further includes steps of transferring in
each access time the burst of the predetermined number of cells from
one of the N inlet modules to respective memory sections of one of the
common memories according to the destination outlet modules of the
cells, and cyclically connecting the common memories and M outlet
modules so that respective outlet modules are connected to the
respective memory section for reading out cells contained therein.

Brief Description of the Drawings

Figure 1 is a known buffer-space-buffer switch;

Figure 2 is a known three-stage switch;

Figure 3 shows a buffer-space switch with inlet sorting and burst
transfer;

Figure 4 shows variable-length homogeneous bursts;

Figure 5 shows constant-length heterogeneous bursts;

Figure 6 is a three-stage rotator-linked switch according to one
embodiment of the invention;

Figure 7 is a three-stage switch with inlet sorting, burst transfer,
and distributed control according to another embodiment of the
invention;

Figure 8 shows the operation of rotating-access to middle CMs
according to the invention;

Figure 9 illustrates the matching process of the invention;

Figure 10 shows the simultaneous matching process;

Figure 11 shows yet a further embodiment of the invention
which uses a ring configuration;

Figure 12 is a switch system in folded architecture;

Figure 13 is a common memory multiplexer/demultiplexer
switch;

Figure 14 is a $k^2 \times k^2$ rotator requiring 2k units of smaller kxk
rotators;

Figure 15 is a $k^3 \times k^3$ rotator requiring $3k^2$ units of smaller kxk
rotators;

Figure 16 is a graph showing inlet-buffer occupancy distribution;
and

Figures 17, 18 and 19 are graphs showing cell-delay
complementary functions under different conditions.
Mode(s) of Carrying Out the Invention

Figure 6 shows schematically a switching architecture according
to one embodiment of the invention. In this embodiment, N inlet
modules 30 and N outlet modules 32 are linked by two optical rotators
34, 36 and P middle modules 38. Each inlet module receives data from
n inputs 40 and sends multiplexed data to rotator 34 through a serial
link 42. Each outlet module accepts multiplexed data from rotator 36
and demultiplexes them to n outputs 44. N, P and n are any positive
integers. It is also possible to have different numbers of inlet
and outlet modules. In this embodiment, the inlet, middle and outlet
modules are made of several common memory modules,
designated $CM_0$ to $CM_{N-1}$ and $CM_0$ to $CM_{P-1}$. Each rotator is a
kxk rotator, k>1, which is a periodic selector, equitably connecting each
of its inputs to each of its outputs. In other words, it is a counter-driven
kxk selector. It functions as k parallel sets of k serial links (a total of
$k^2$ links). With identical inlets of speed x b/s each, for example, the
speed of each link is x/k b/s. These links are hereafter called "virtual
links". They are called virtual because they are reconfigurable. A virtual
link connects an outer CM to a middle CM during a fixed interval of
several time slots (a time slot is the cell duration). This interval is
called the "access time", denoted A.

A burst of cells of possibly different destinations is transferred 
from an inlet CM to a middle CM per access time. The highest 
individual-connection rate is r*(n/P), where n is the number of 
external ports per CM, P is the number of middle CMs and r is the 
speed of an external port. This configuration works almost exactly like 
the three-stage switch shown in Figure 2, the only difference being that 
the links from a given outer module to the set of middle modules are 
fast and intermittent in this architecture as compared to slower and 
continuous in the standard architecture of Figure 2; hence the need for 
burst transfer. 

This architecture does not require control communications
between the stages. The capacity is virtually unlimited. The overall
cell delay variance, however, may be unacceptable for CBR (constant bit
rate) and other delay-sensitive traffic. This problem can be solved by
appropriate path selection for the virtual circuits at the call-admission
stage and by providing priority classification, at least at the middle CMs.
Basically, the internal-routing mechanism should distribute the
delay-sensitive traffic equitably among the middle CMs, where it is given
high transfer priority to the output modules. There are different traffic
classifications such as CBR, VBR (variable bit rate), and
ABR (available bit rate). An individual connection, regardless of
classification, must use the same middle CM in order to maintain
proper cell sequence. This requirement limits the highest connection
rate per user to r*n/P. For example, if r=620 Mb/s, n=16, and P=256, the
capacity of the switch is approximately 2.5 Tb/s but the highest
connection rate is less than 40 Mb/s (620x16/256). Thus, the price of
high capacity is a reduced upper bound on individual connection rates.
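
The arithmetic of this example can be checked with a short sketch. The function names are illustrative only, and the capacity figure assumes that the number of outer modules equals the number of middle modules (N = P = 256), which the example implies but does not state.

```python
def max_connection_rate_mbps(r_mbps, n, p):
    """Upper bound r*n/P on a single connection confined to one middle CM."""
    return r_mbps * n / p


def switch_capacity_tbps(r_mbps, n, num_outer_modules):
    """Aggregate external capacity: ports per module x modules x port speed."""
    return r_mbps * n * num_outer_modules / 1e6


# Values from the text: r = 620 Mb/s, n = 16, P = 256.
# Assumption: N = P = 256 outer modules.
rate = max_connection_rate_mbps(620, 16, 256)
capacity = switch_capacity_tbps(620, 16, 256)
print(f"highest connection rate: {rate} Mb/s")
print(f"switch capacity: {capacity:.2f} Tb/s")
```

The connection bound comes out just under 40 Mb/s while the aggregate capacity is about 2.5 Tb/s, which is the trade-off the paragraph describes.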

The same high capacity, but with an individual connection rate
as high as the external port speed, is realized with further controls as
will be described below in connection with a further embodiment of
the invention.

Figure 7 shows such an embodiment which uses a matching
process. During a rotator cycle, each inlet module 50 visits each middle
CM 52. The access time, denoted A (slots), during each visit is fixed. At
a rotator port speed of 10 Gb/s, for example, a value of A=16 corresponds
to about 0.7 µsec. During each access time, a number of cells belonging
to one or more outlet modules is transferred. A guard time of one or
two cells may be needed within each access time. The productive part
of the access time is hereafter called the duty cycle, and the number of
cells per duty cycle is denoted D. A control array within each middle
CM stores the number of cells destined to each of the N outlet
modules. There are N inlet and outlet modules in this embodiment
but unequal numbers are possible. During each access time, each outlet
module reads the cells destined to it and resets the corresponding entry
of the control array to zero. The maximum number of cells read per
access time is D (14, for example, if A is chosen to be 16 cells and a guard
time of two cells is used). The admission of cells to the middle CMs is
based on a matching process. The capacity is dependent on the size of
the rotator and is virtually unlimited. There is a constant delay from
each inlet to each outlet. This delay varies from one access time (of the
order of 1 µsec) to N access times, but is constant for the same
inlet-outlet pair. For example, with 16 OC12 (optical carrier, about
620 Mb/s) ports per inlet CM module, a time slot (ATM cell duration) at
the optical rotator port is about 40 nsec. Selecting an access time of 16
slots (about 0.7 µsec), the worst constant delay in a large switch with 256
middle CM modules (2.5 Tb/s capacity) is less than 200 µsec.
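
The timing figures quoted above follow from the cell size and the rotator port speed. The sketch below reproduces them under the assumption of a 424-bit ATM cell (the value of B used elsewhere in this description) and a rotator port speed of exactly 10 Gb/s; the function names are illustrative.

```python
CELL_BITS = 424  # bits per ATM cell (B in this description)


def slot_ns(port_gbps):
    """Duration of one time slot (one cell) in nanoseconds."""
    return CELL_BITS / port_gbps  # bits / (Gb/s) gives ns


def access_time_us(port_gbps, a_slots):
    """Duration of one access time of a_slots slots, in microseconds."""
    return a_slots * slot_ns(port_gbps) / 1000.0


def rotator_cycle_us(port_gbps, a_slots, num_middle):
    """Time for an inlet module to visit every middle CM once."""
    return num_middle * access_time_us(port_gbps, a_slots)


slot = slot_ns(10)                     # about 40 nsec
access = access_time_us(10, 16)        # about 0.7 usec
cycle = rotator_cycle_us(10, 16, 256)  # below 200 usec
duty_cycle = 16 - 2                    # A = 16 slots, guard time of 2 slots
print(slot, access, cycle, duty_cycle)
```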

During each access time (of 16 slots duration for example), each
inlet module transfers a burst of cells to a middle module. The
number of transferred cells is limited by the duty cycle, which is
defined as the connection period (in cell times) minus the guard time
(one or two cells, for example). In a 16 slot access time with 2 slot guard
time, the duty cycle is 14. The cells may belong to many outlet
modules.

Figure 8 shows the operation of the middle CMs 60 in an 8x8
switch (N=8), each row representing a middle CM which is logically
divided into 8 sections, each of which corresponds to an outlet module.
A section is as wide (again only logically) as the duty cycle (14 cells, for
example). During an access time, inlet module 6 in Figure 8 is storing
cells, in the top CM, destined to outlet modules 1, 2, 4, and 7. Each inlet
module may write in different sections during the access time, after
which the rotator moves to the next position. However, the accessing
outlet module can only read whatever is found in its dedicated (logical)
section. Thus outlet module 6 reads only cells stored in section 6 of
each row as the rotator moves around.

The composition of the burst is determined through a simple
matching process, as depicted in Figure 9. Each inlet module keeps an
array 70 of the number of waiting cells per destination and each middle
module keeps an array 72 of the number of free slots per destination.
The two arrays are matched in a cyclic order. As designated by 74, the
number of cells accepted is the lesser of the number of waiting cells and
the number of free slots for each destination inspected, the total being
limited by the duty cycle.
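
The matching rule just described can be sketched in a few lines. This is a minimal illustration rather than the patented implementation: the function name, the list representation of arrays 70 and 72, and the `start` parameter (the destination at which the cyclic inspection begins) are assumptions made for the example.

```python
def match_burst(waiting, free, duty_cycle, start=0):
    """Compose one heterogeneous burst.

    waiting[d]  : cells waiting at the inlet module for destination d
    free[d]     : free slots in the middle CM section for destination d
    duty_cycle  : maximum number of cells per access time (D)
    start       : destination inspected first (assumed cyclic pointer)
    Returns the number of cells accepted per destination.
    """
    n = len(waiting)
    grants = [0] * n
    remaining = duty_cycle
    for offset in range(n):
        if remaining == 0:
            break
        dest = (start + offset) % n
        # Lesser of waiting cells and free slots, capped by what is left of D.
        accepted = min(waiting[dest], free[dest], remaining)
        grants[dest] = accepted
        remaining -= accepted
    return grants


g1 = match_burst([5, 0, 9, 3], [14, 14, 4, 14], duty_cycle=14)
g2 = match_burst([10, 10, 0, 0], [14, 14, 14, 14], duty_cycle=14)
g3 = match_burst([10, 10, 0, 0], [14, 14, 14, 14], duty_cycle=14, start=1)
print(g1, g2, g3)
```

In the first call destination 2 is limited by its free slots; in the second and third calls the duty cycle is the binding limit, and the starting point decides which destination is served in full.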

Figure 10 shows the simultaneous transfer of bursts from N inlet
modules to N middle modules (P=N). In the Figure, inlet modules 0,
1, ..., N-2, N-1 are accessing middle modules 1, 2, ..., N-1, 0. The logical
order of rotation need not follow the physical port order, i.e. the order
can be preset in any way at the rotators. Cell order is preserved since
the middle buffers are visited sequentially by both the inlet
multiplexers and the outlet demultiplexers. The maximum number of
cells to be stored in any middle CM is D times N, where D is the
number of cells per duty cycle. The number of cells actually stored in
the middle memory varies according to traffic load composition and
the cell arrival pattern.
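
A rotator connection pattern of this kind can be modelled as a modular shift. The sketch below assumes the simplest possible order (inlet i is connected to middle module (i + step) mod N); as noted above, the actual order can be preset in any way, so this mapping is only one illustrative choice.

```python
def middle_for_inlet(inlet, step, n):
    """Middle module reached by an inlet at a given rotator step (assumed order)."""
    return (inlet + step) % n


def connections(step, n):
    """Middle module accessed by each inlet module at this rotator step."""
    return [middle_for_inlet(i, step, n) for i in range(n)]


# The snapshot of Figure 10 with N = 8: inlets 0..7 access middles 1..7, 0.
snapshot = connections(1, 8)
print(snapshot)

# Every step is a one-to-one mapping, so the N transfers never collide,
# and over a full cycle each inlet visits every middle module once.
collision_free = all(sorted(connections(s, 8)) == list(range(8)) for s in range(8))
visited = sorted(middle_for_inlet(3, s, 8) for s in range(8))
```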

A direct method for performing the matching process of Figures
9 and 10 is to let each inlet module send to the middle module, which
it will access during the subsequent access time, an array of N words of
d bits each, e.g., an array designated by 80, where $d = \lceil \log_2(D) \rceil$, D being
the duty cycle and $\lceil \cdot \rceil$ denotes rounding up to the nearest integer. The
maximum number of cells that can be transferred to any destination
equals the number of cells in the duty cycle. Thus, with N=256 ports
and D=14 (i.e., d=4), the number of bits transferred per access time is
1024 (approximately 2.4 cells). Each middle module then responds with
a grant message 82 indicating the selected destinations, and the
permissible number of cells for each. The maximum number of bits in
the grant message is D(v + d), where $v = \lceil \log_2(N) \rceil$. (There are at most D
selected outlet modules per grant; v bits store the outlet module
number and d bits store the number of cells per selected outlet
module.) In the above example (N=256, D=14), the maximum message
length is 168 bits. The ratio, θ, of the grant message overhead to the
switch capacity is:

$$\theta = \frac{\lceil \log_2(N) \rceil + \lceil \log_2(D) \rceil}{B}$$

where B is the number of bits per ATM cell (B=424). With N=256 and
D=14, θ is 0.028.
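
The message sizes in this example can be reproduced as follows. The function names are illustrative; the overhead ratio is computed as the maximum grant length D(v + d) divided by the D·B payload bits of one duty cycle, which is an interpretation that agrees with the quoted value of 0.028.

```python
import math


def bits_for(x):
    """Word width: log2(x) rounded up to the nearest integer."""
    return math.ceil(math.log2(x))


def request_bits(n, duty):
    """Inlet state array: N words of d bits each."""
    return n * bits_for(duty)


def max_grant_bits(n, duty):
    """At most D entries, each holding an outlet number (v bits) and a count (d bits)."""
    return duty * (bits_for(n) + bits_for(duty))


def grant_overhead_ratio(n, duty, cell_bits=424):
    """Maximum grant bits relative to the D*B payload bits of a duty cycle."""
    return max_grant_bits(n, duty) / (duty * cell_bits)


req = request_bits(256, 14)            # 1024 bits
req_cells = req / 424                  # about 2.4 cells
grant = max_grant_bits(256, 14)        # 168 bits
ratio = grant_overhead_ratio(256, 14)  # about 0.028
print(req, round(req_cells, 1), grant, round(ratio, 3))
```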

It is noted that the control-data transfer can be done in the
reverse order; the middle modules may send their state information to
the inlet module which performs the matching process.

This overhead (the volume of control data) can be reduced
significantly by another embodiment described below. In this
embodiment, the inlet modules send their inlet buffer states to middle
modules every several access times and let each middle module pass
the inlet data, modified by the matching outcome, to the following
unit. In other words, when middle module Y receives the state array
from inlet module X, Y performs the matching process, reduces the
inlet-state array according to the outcome of the matching process, and
passes the reduced array to the next middle module Y+1 (modulo N),
which will be accessed by the same inlet module X during the
subsequent access time. The matching process must be implemented
within the access time A. It is also noted that the ratio A/D is the
expansion needed to realize an internally non-blocking switch (e.g.,
16/14).

Figure 11 illustrates such a mechanism, using a ring
configuration. An inlet module 90 sends its buffer state information to
middle module 92, e.g., $CM_0$, which it will access for data transfer at
the next access time. After having performed a matching process, $CM_0$
updates the buffer state of the inlet module 90 and sends the
information to the following module, e.g., $CM_1$, which performs the
matching process with the updated buffer state of the inlet module 90
when the inlet module 90 accesses middle module $CM_1$ for data
transfer. The further updated buffer state information is sent to the
next middle module and so on for, e.g., four access times (as shown in
the example of Figure 11), at which time inlet module 94 sends its
buffer state information to a middle module, e.g., $CM_x$.
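
The ring mechanism can be illustrated with a small simulation in which the inlet state array is reduced by each middle module and handed to the next one. This is a simplified sketch under stated assumptions: the names are hypothetical, destinations are inspected in index order, and cells arriving at the inlet module after the state was sent are ignored.

```python
def match(waiting, free, duty):
    """Accept min(waiting, free) per destination, capped in total by the duty cycle."""
    grants, budget = [0] * len(waiting), duty
    for dest in range(len(waiting)):
        grants[dest] = min(waiting[dest], free[dest], budget)
        budget -= grants[dest]
    return grants


def ring_pass(inlet_state, middle_free, first, hops, duty):
    """Pass one inlet state array around `hops` consecutive middle modules."""
    state = list(inlet_state)
    history = []
    p = len(middle_free)
    for hop in range(hops):
        m = (first + hop) % p  # module accessed in this access time
        grants = match(state, middle_free[m], duty)
        # The module reduces the state array before forwarding it.
        state = [s - g for s, g in zip(state, grants)]
        middle_free[m] = [f - g for f, g in zip(middle_free[m], grants)]
        history.append((m, grants))
    return state, history


free = [[14] * 4 for _ in range(4)]  # four middle CMs, four destinations
left, log = ring_pass([20, 0, 5, 0], free, first=0, hops=2, duty=14)
print(left, log)
```

Here the first module grants 14 cells to destination 0, and the second module, working from the reduced array, grants the remaining 6 cells plus the 5 cells for destination 2.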

This mechanism is tolerant to long propagation delays from 
inlet to middle modules, otherwise propagation delays exceeding the 
access time may complicate the control function due to the inter- 
dependence of successive matching processes. 

Sending the inlet state data every L access times, L>1, would
reduce the corresponding control data volume by a factor η:

$$\eta = \frac{\lceil \log_2(L \cdot D) \rceil}{L \, \lceil \log_2(D) \rceil}$$

The performance of the switch is quite insensitive to this artificial
delay, and one may choose to send the inlet update every 16 or 32 access
times. With L=32 and D=14, the factor η is 0.0703. Note that η=1 when
L=1. The ratio, ε, of the control data overhead from the inlet modules
to the middle modules to the switch capacity is:

$$\varepsilon = \frac{N \, \lceil \log_2(L \cdot D) \rceil}{B \, L \, D}$$

where B is the number of bits per ATM cell (B=424). With L=32, D=14, and
N=256, ε is only 0.012.
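
Both ratios are easy to evaluate numerically. The following sketch, with illustrative function names, reproduces the quoted values for L=32, D=14 and N=256.

```python
import math


def eta(l_times, duty):
    """Control-data reduction factor when state is sent every L access times."""
    return math.ceil(math.log2(l_times * duty)) / (
        l_times * math.ceil(math.log2(duty))
    )


def epsilon(n, l_times, duty, cell_bits=424):
    """Inlet-to-middle control overhead relative to the switch capacity."""
    return n * math.ceil(math.log2(l_times * duty)) / (cell_bits * l_times * duty)


eta_32 = eta(32, 14)           # 9 / (32 * 4)
eta_1 = eta(1, 14)             # no reduction when L = 1
eps = epsilon(256, 32, 14)     # about 0.012
print(eta_32, eta_1, round(eps, 3))
```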

The variable delay encountered in traversing the inlet and
middle CMs is negligible for all traffic streams. Hence, priority
classification is not necessary in either the inlet stage or the middle
stage. It is noted, however, that priority service may be needed at
egress, i.e., in the outlet stage. The egress performance is similar to that
of a single-stage CM switch and is not discussed here.

The relevant performance indices here are the grade-of-service
(GOS), determined mainly by the call-admission blocking, and the
quality-of-service (QOS), which is determined by the cell loss and/or
cell transfer delay. The cell delay and cell loss contribution of the
switching network of the present invention is at least an order of
magnitude smaller than the contribution of the egress stage. Thus the
overall performance is comparable to that of the (ideal) single stage
switch, under similar traffic conditions.

The performance issues at the call and cell levels will be
discussed below. Call-level blocking applies mainly to CBR and VBR
traffic streams where the admission of a new arrival depends on its
declared traffic descriptors and, hence, a calculated "effective bit rate"
(EBR). The EBR value is determined by the cell-level performance
(cell loss and/or cell delay variation). The multiplexing of the lower
rate traffic at the inlet modules into a high speed stream, and the
subsequent distribution among the middle CMs, have very little effect
on the overall cell delay which is determined mainly by the outlet
occupancy. The internal variable delay is negligible since each inlet
module is free to transfer cells through any middle module for any
virtual circuit. In addition, as seen in Figure 16 which will be described
later, the modest cell storage requirement at the inlet modules
eliminates cell loss as a major concern. Thus, the EBR calculation can
be based on standard methods applicable to single-stage output buffered
or common memory switches. Like the output buffered or common
memory switches, the switches of the present invention, with zero
guard time, can be treated as non-blocking at the call level. In other
words, the admission, or otherwise, of an arrival is determined only by
the state of the designated outlet.

With a non-zero guard time, the link capacity is somewhat
reduced. It is customary, however, to allow some internal expansion
where the inner links are of a slightly higher speed than the outer
links. The expansion is provided to facilitate internal flow control and
it does not affect the traffic performance of single-stage switches. The
expansion has the added benefit of offsetting the guard time overhead.

By definition, a switch is considered non-blocking if the blocking
of an incoming request is determined solely by the designated outgoing
link. Because of link blocking, the call-level occupancy of an outlet
port, i.e., the sum of the EBRs of the calls in progress divided by the
port bit rate, fluctuates around its mean value below unity. The outer
links would occasionally be in the state of full call level occupancy. To
realize an acceptable call blocking (0.01 for example), the outer links
may be engineered for a mean call level occupancy of 0.8 or so,
depending on the traffic composition. The mean cell level occupancy
is lower than the mean call level occupancy since the EBR for a VBR
connection is always higher than the mean bit rate of the connection.
The simulation results which will be discussed later are based on a
pessimistic mean cell level occupancy of 0.80, and the internal
expansion is assumed to be zero. With a typical expansion of 0.1 or so,
the delay variation would be appreciably smaller.

Cell level performance is normally expressed in terms of the cell
loss probability and the cell delay variation. The cross office round trip
delay, traditionally specified for circuit switches to be less than one
millisecond or so, is still applicable to ATM switches.

The cross office round trip delay for the proposed switch is a
constant which is equal to the rotator cycle duration. In a 256 port
switch, with a port speed of 10 Gb/s, the rotator cycle is about 175 µsec
with a 16 slot access time, or 88 µsec with an access time of 8 slots. The
round trip delay is the sum of the delay from inlet port x to outlet port
y plus the delay from inlet port y to outlet port x. The two components
are not equal, and each varies from one access time A to (N-1)A. A
long x-y delay corresponds to a short y-x delay, and the sum is
constant.
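
The complementary nature of the two one-way delays can be illustrated with a toy model. The model below is an assumption made for illustration, not a statement of the actual rotator phasing: it takes the constant x-to-y delay to be ((y - x) mod N) access times, which gives values from A to (N-1)A for distinct ports and a constant sum of N access times.

```python
def one_way_delay_slots(x, y, n, a_slots):
    """Assumed constant delay from port x to port y, in slots (x != y)."""
    hops = (y - x) % n  # 1 .. n-1 for distinct ports
    return hops * a_slots


def round_trip_slots(x, y, n, a_slots):
    """Sum of the x-to-y and y-to-x delays."""
    return one_way_delay_slots(x, y, n, a_slots) + one_way_delay_slots(
        y, x, n, a_slots
    )


short = one_way_delay_slots(0, 1, 8, 16)   # one access time
long_ = one_way_delay_slots(1, 0, 8, 16)   # seven access times
constant_sum = all(
    round_trip_slots(x, y, 8, 16) == 8 * 16
    for x in range(8)
    for y in range(8)
    if x != y
)

# For N = 256 and A = 16 at 42.4 ns per slot the cycle is about 174 usec.
cycle_us = round_trip_slots(0, 1, 256, 16) * 42.4 / 1000
print(short, long_, constant_sum, round(cycle_us, 1))
```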

The cell-delay variation is the more critical performance index
since it determines the size of the smoothing buffers used for CBR
connections. The switch of the invention (with 5 to 10 Gb/s rotator
links) yields a delay dispersion, at the 10 th quantile, well below the
commonly accepted bound of 250 µsec at the chosen reference load.

According to yet a further embodiment, the folded arrangement
depicted in Figure 12 may be used in the configurations of Figures 2, 3,
6 and 7. Each CM module serves as a combined
multiplexer-demultiplexer-switch as in Figure 13. An nxn CM module is
operated as an (n/2):1 multiplexer, a 1:(n/2) demultiplexer, and an
(n/2):(n/2) switch (for an even number of external ports n). The
effective internal expansion ratio is increased due to the
intra-switching facility.

Since the multiplexing function requires very little storage
capacity, the common memory capacity of each combined inlet-outlet
module is used mainly for egress queuing. A major advantage of this
configuration is that only the inter-module traffic would have to
traverse the middle stage. This results in reducing the rate of cell
transfer across the middle stage and hence decreasing the contention
delay.

The set of virtual links connecting the outer CMs and the
middle CMs is realized as a simple rotator. According to further
embodiments of the invention, large rotators may be constructed using
smaller size rotator units of size kxk each (k>1) by cascading banks of
small units operating at different speeds; that is to say, a $k^h \times k^h$
rotator, h=1,2,..., can be built by using h rows of $k^{h-1}$ smaller rotators
of size kxk each. The innermost units must switch ports every A slots,
where A is the desirable access-time. The units of the second bank must
switch ports every kA slots. Figure 14 shows a two-stage configuration
which extends the capacity to $k^2 \times k^2$. A third bank, whose units switch
ports every $k^2 A$ slots, extends the capacity to $k^3 \times k^3$ as shown in
Figure 15. For example, a 256x256 rotator requires 32 rotators of size
16x16 arranged in two rows of 16 units each (here k=16 and h=2). In the
configuration of Figure 15, with k=16, a 4096 x 4096 rotator can be
constructed with 768 units (three rows, h=3, of 256 units each) of 16x16
rotators. It is interesting to note that a non-blocking space switch of
the same size would require 65536 units of 16x16 space switches arranged
as a square. Unlike the space switch, the rotator's operation is cyclic
and traffic independent.
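
The unit counts in these examples follow directly from the h rows of $k^{h-1}$ units. A short sketch, with illustrative function names, reproduces them together with the switching period of each bank:

```python
def rotator_units(k, h):
    """Number of kxk units in a k^h x k^h rotator: h rows of k^(h-1) units."""
    return h * k ** (h - 1)


def space_switch_units(size, k):
    """kxk units in a square array forming a size x size space switch."""
    return (size // k) ** 2


def bank_periods(k, h, a_slots):
    """Slots between port changes for each bank, innermost bank first."""
    return [k ** i * a_slots for i in range(h)]


units_256 = rotator_units(16, 2)            # 256x256 rotator
units_4096 = rotator_units(16, 3)           # 4096x4096 rotator
square_4096 = space_switch_units(4096, 16)  # same-size space switch
periods = bank_periods(16, 3, 16)           # A, kA, k^2 A with A = 16
print(units_256, units_4096, square_4096, periods)
```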

Simulation Results

A simulator for the proposed architecture was developed and
used to study the performance of switches of different sizes ranging
from N=8 to N=256, with both the folded architecture (with
intra-switching in the outer modules) and the unfolded architecture
(with no intra-switching). The number of cells processed in each case is
about $2.5 \times 10^8$. The inlet module buffer occupancy and the variable
cell delay are shown for a port mean cell-occupancy of 0.80.

The traffic arriving at an inlet module port (of OC12 rate, for
example) is a multiplex of traffic streams generated by several sources.
The traffic generated by each source is assumed to be very bursty, with a
large ratio of peak rate to mean rate. Using the ON-OFF model with
geometrically distributed "ON" and "OFF" periods, the multiplexed
traffic at the inlet module port is assumed to have a mean burst length
of 20 (implicitly, the individual sources would have much larger burst
lengths at their own peak rates). The composite traffic, at the rotator
port rate of 10 Gb/s or so, is much less bursty. As shown in the
simulation results below, the dispersion of the delay of cell transfer to
the output stage is quite small. The egress process at the output stage
(demultiplexing to slower ports) contributes most of the delay and is
affected by burstiness in the same way as a single-stage CM switch.
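The ON-OFF traffic model described above can be sketched as a slotted two-state chain. The following Python sketch is not the simulator used to obtain the reported results; it is a minimal illustration under stated assumptions. The names on_off_slots, mean_off_period and mean_burst_length are hypothetical. The mean OFF period of 5 slots is not given in the text; it is derived here from the stated mean burst length of 20 and the port mean cell-occupancy of 0.80.

```python
import random


def mean_off_period(mean_burst: float, occupancy: float) -> float:
    """Mean OFF period (slots) that yields the given occupancy
    for a given mean ON period (burst length)."""
    return mean_burst * (1.0 - occupancy) / occupancy


def on_off_slots(n_slots: int, mean_burst: float = 20.0,
                 occupancy: float = 0.80, seed: int = 1) -> list:
    """Generate n_slots of 0/1 cell arrivals from a two-state chain.
    ON and OFF periods are geometrically distributed (assumption:
    each period lasts at least one slot)."""
    rng = random.Random(seed)
    p_end_on = 1.0 / mean_burst
    p_end_off = 1.0 / mean_off_period(mean_burst, occupancy)
    on = rng.random() < occupancy  # start in the stationary state
    slots = []
    for _ in range(n_slots):
        slots.append(1 if on else 0)
        if on:
            if rng.random() < p_end_on:
                on = False
        elif rng.random() < p_end_off:
            on = True
    return slots


def mean_burst_length(slots: list) -> float:
    """Average length of the runs of consecutive occupied slots."""
    bursts, cells, prev = 0, 0, 0
    for s in slots:
        if s:
            cells += 1
            if not prev:
                bursts += 1
        prev = s
    return cells / bursts if bursts else 0.0


if __name__ == "__main__":
    trace = on_off_slots(200_000)
    print(sum(trace) / len(trace), mean_burst_length(trace))
```

Over a long trace the measured occupancy and mean burst length should approach 0.80 and 20 respectively; the sketch models one port only and does not reproduce the multiplexing of several sources.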

The simulation results were derived for spatially-balanced
traffic. Results obtained for several cases with high spatial imbalance
(large variance of traffic intensity for different inlet-outlet pairs) show
negligible sensitivity to the spatial traffic distribution as long as the
overall load for each outlet port remains unchanged.

Figure 16 shows the inlet buffer distribution for the case of access
time A of 16 cell intervals (slots) with a guard time y of 2 cell intervals.
For a cell loss of the order of 10^-7, a buffer size of less than 35 cells
suffices.

Figure 17 shows the distribution of the variable cell delay (cell
delay complementary function) in the folded architecture. The access
time A is kept constant at 16 cell intervals for the different switch sizes.
The guard time is 2 cell intervals and the cell interval is 0.08 μseconds.
The delay increases with the number of ports due to the increase
in the rotator cycle.

Figure 18 shows the delay performance (cell delay
complementary function) in the folded architecture when the access
time A is reduced to 8 slots and the guard time is reduced to one slot.
The cell interval is 0.08 μseconds. The idle (guard) time remains
proportionately the same as in the case of Figure 17; however, the delay
performance improves due to the reduced access time.
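The statement that the guard time remains proportionately the same can be checked with a short calculation. The Python sketch below is illustrative only; the function names are hypothetical. The conversion from cell interval to bit rate is an assumption not stated in the text: it takes the standard 53-byte ATM cell and treats the cell interval as the time to transfer one whole cell.

```python
ATM_CELL_BITS = 53 * 8  # standard ATM cell: 53 bytes = 424 bits


def guard_fraction(access_time_slots: int, guard_slots: int) -> float:
    """Fraction of each access interval spent idle (guard time)."""
    return guard_slots / access_time_slots


def implied_rate_gbps(cell_interval_us: float) -> float:
    """Bit rate implied by one 53-byte cell per cell interval
    (assumption: no additional per-cell overhead)."""
    return ATM_CELL_BITS / cell_interval_us / 1000.0


if __name__ == "__main__":
    print(guard_fraction(16, 2))    # access time 16, guard time 2
    print(guard_fraction(8, 1))     # access time 8, guard time 1
    print(implied_rate_gbps(0.08))  # cell interval of 0.08 microseconds
```

Both guard settings idle one eighth of each access interval, so the difference between the two cases comes from the shorter access time rather than from a change in overhead.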

Figure 19 shows the delay performance (cell delay
complementary function) in the case of the unfolded architecture (no
intra-module switching). The access time A is 16 cells, the guard time is
2 cells and the cell interval is 0.04 μseconds.

WHAT IS CLAIMED IS:

1. In a rotating access high capacity ATM switching system for
switching data among N inlet modules and M outlet modules in each
successive access time in a burst of a predetermined number of cells, M
and N being positive integers, the invention being characterized in
that:

said N inlet modules having buffers, each buffer dedicated to
each of said outlet modules, for storing cells according to destination
outlet modules of said cells in respective buffers;

P common memories, P being a positive integer, each common
memory having M memory sections, each of which is able to hold at
least said predetermined number of cells and is dedicated to each outlet
module;

an inlet rotator for cyclically connecting in each access time said
N inlet modules and P common memories so that respective cells are
transferred from said N inlet modules and stored in respective sections
according to the destination outlet module of each cell; and

an outlet rotator for cyclically connecting in each access time said
P common memories and M outlet modules so that respective outlet
modules are connected to respective memory sections for reading out
cells contained therein.



2. The high capacity ATM switching system according to claim 1
wherein the inlet module comprises a buffer which is logically
partitioned to buffer sections corresponding to said M output modules.

3. The high capacity ATM switching system according to claim 2
wherein said inlet and outlet rotators are kxk rotators making k
simultaneous connections, k being an integer larger than 1.

4. The high capacity ATM switching system according to claim 3
further comprising a matching mechanism for matching cells stored in
the buffers of the inlet modules and free slots in the sections of the
common memories according to the destinations of the predetermined
number of cells in a burst.

5. The high capacity ATM switching system according to claim 4
wherein the inlet modules send inlet control data, to the matching
mechanism, concerning the number of cells stored in their buffers
according to the destinations of the predetermined number of cells;
the common memories send memory state data, to the matching
mechanism, concerning the number of free slots available in the
sections; and the matching mechanism sends grant signals to the inlet
modules for the number of cells to be transferred according to the
destinations of the predetermined number of cells.



6. The high capacity ATM switching system according to claim 4
further comprising:

each inlet module sending inlet control data to one of the
common memories in every several access times;

a ring controller connecting said P common memories in a ring
configuration;

each common memory having a matching mechanism for
sending to each inlet module the grant signals during each access time
and updating the inlet control data as a result of the grant signals; and

each common memory sending the updated inlet control data to
the following common memory in the ring.

7. The high capacity ATM switching system according to claim 5 
wherein M=N=P. 

8. The high capacity ATM switching system according to claim 6 
wherein M=N=P. 

9. The high capacity ATM switching system according to claim 7
wherein each of the input and output rotators comprises h tandemly
connected sets of (kxk) rotators where h is a positive integer and
k = ⌈M^(1/h)⌉, in that the number of said (kxk) rotators in each set is
k and one set operates k times faster than the other set.

10. The high capacity ATM switching system according to claim 8
wherein each of the input and output rotators comprises h tandemly
connected sets of (kxk) rotators where h is a positive integer and
k = ⌈M^(1/h)⌉, in that the number of said (kxk) rotators in each set is
k and one set operates k times faster than the other set.

11. In a rotating access high capacity ATM switching system for
switching data among N inlet modules and M outlet modules in each
successive access time, M and N being positive integers, a method of
switching data in a burst of a predetermined number of cells among N
inlet modules and M outlet modules in each successive access time
being characterized in that:

each of said N inlet modules storing cells in separate buffers
according to the destination outlet modules of said cells;

cyclically connecting said N inlet modules and P common
memories, P being a positive integer;

transferring in each access time the burst of said predetermined
number of cells from one of said N inlet modules to respective
memory sections of one of said common memories according to the
destination outlet modules of said cells; and

cyclically connecting said common memories and M outlet
modules so that respective outlet modules are connected to a
respective memory section for reading out cells contained therein.

12. The method of switching data in a burst of a predetermined
number of cells according to claim 10, wherein the steps of cyclically
connecting comprise steps of:

cyclically making k simultaneous connections in each access
time between said inlet modules and said common memories, and
between said common memories and M outlet modules, k being an
integer larger than 1.

13. The method of switching data in a burst of a predetermined
number of cells according to claim 12 further comprising steps of:

matching cells stored in the buffers of the inlet modules and free
slots in the sections of the common memories according to the
destination of the predetermined number of cells in a burst.



14. The method of switching data in a burst of a predetermined
number of cells according to claim 13, further comprising steps of:

matching inlet control data and memory state data, the former
concerning the number of cells stored in their buffers according to the
destinations of the predetermined number of cells and the latter
concerning the number of free slots available in the sections; and

sending grant signals to the inlet modules for a number of cells
to be transferred according to the destinations of the predetermined
number of cells.



15. The method of switching data in a burst of a predetermined
number of cells according to claim 14 wherein said P common
memories are connected in a ring configuration, the method further
comprising steps of:

each inlet module sending inlet control data to one of the
common memories in every several access times;

each common memory matching the inlet control data and
memory state data and sending to each inlet module the grant signals
during each access time;

each common memory further updating the inlet control data as
a result of the grant signals; and

each common memory sending the updated inlet control data to
the following common memory in the ring.